Supervised Semi-definite Embedding for Email Data Cleaning and Visualization

Authors

  • Ning Liu
  • Fengshan Bai
  • Jun Yan
  • Benyu Zhang
  • Zheng Chen
  • Wei-Ying Ma
Abstract

Email systems play an important and irreplaceable role in the digital world due to their convenience, their efficiency, and the rapid growth of the World Wide Web (WWW). However, most email users today receive large amounts of irrelevant and noisy email every day. Algorithms that can clean both the noisy features and the irrelevant emails are therefore highly desirable. In this paper, we propose a novel Supervised Semi-definite Embedding (SSDE) algorithm that reduces the dimensionality of email data, leaving out its noisy features, and visualizes the emails in a supervised manner so that irrelevant ones can be found intuitively. Experiments on emails received by several volunteers over a period of time, and on some benchmark datasets, show that the proposed SSDE algorithm achieves comparable performance.

Similar resources

Supervised Semi-definite Embedding for Image Manifolds (WedPmPO1)

Semi-definite Embedding (SDE) is a recently proposed method for unsupervised nonlinear dimensionality reduction. It maximizes the sum of pairwise squared distances between outputs while keeping the input data and the outputs locally isometric, i.e. it pulls the outputs as far apart as possible, subject to unfolding the manifold without any furling or folding. The extensions of SDE to supervised feature extraction, ...

Parametric Embedding for Class Visualization

We propose a new method, parametric embedding (PE), that embeds objects with the class structure into a low-dimensional visualization space. PE takes as input a set of class conditional probabilities for given data points and tries to preserve the structure in an embedding space by minimizing a sum of Kullback-Leibler divergences, under the assumption that samples are generated by a gaussian mi...

Effective semi-supervised nonlinear dimensionality reduction for wood defects recognition

Dimensionality reduction is an important preprocessing step in high-dimensional data analysis that aims to avoid losing intrinsic information. A semi-supervised nonlinear dimensionality reduction problem, called KNDR, is considered for wood defect recognition. In this setting, domain knowledge in the form of pairwise constraints is used to specify whether pairs of instances belong to the same class or diff...

Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding

This paper presents a new semi-supervised framework with convolutional neural networks (CNNs) for text categorization. Unlike the previous approaches that rely on word embeddings, our method learns embeddings of small text regions from unlabeled data for integration into a supervised CNN. The proposed scheme for embedding learning is based on the idea of two-view semi-supervised learning, which...

Optimal Neighborhood Preserving Visualization by Maximum Satisfiability

We present a novel approach to low-dimensional neighbor embedding for visualization, based on formulating an information-retrieval-based neighborhood preservation cost function as maximum satisfiability on a discretized output display. The method has a rigorous interpretation as optimal visualization based on the cost function. Unlike previous low-dimensional neighbor embedding methods, our form...

Publication date: 2005